About

This is an R project developed by Felipe Solares da Silva as part of his professional portfolio. To see more projects like this one, visit the portfolio at https://github.com/fsolares/professional-portfolio.

Contact:

Acknowledgment

Thank you, Jaques D’Erasmo (https://github.com/Jaquesd), old friend and fellow Data Science student, for your immeasurable contribution to this project. All your feedback, code suggestions and support along the way gave me the strength to overcome this challenge. Congratulations to us both: this was amazing, real teamwork.

Versions

Zika Virus Interactive Map

Project Purpose

Build interactive graphs to map Zika virus occurrences across Brazilian territory.

Step 1 - Importing Essential Packages and Modules

If you don’t have any of these packages installed in RStudio yet, please run the code below!

chooseCRANmirror(graphics=FALSE, ind=1) # No need to run this part, 
                                        # it's an R Markdown fix.

packs <- c('forcats', 'plotly', 'RColorBrewer', 'tmap', 'rnaturalearthdata',
           'sf','ggplot2','dplyr')
for(p in packs){
  install.packages(p)
}

If you already have some of these packages, run the code below with the name of each package you are missing!

chooseCRANmirror(graphics=FALSE, ind=1) # No need to run this part, 
                                        # it's an R Markdown fix.

install.packages('your missing package')
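As a minimal sketch of the same idea, the missing packages can be detected instead of typed by hand. The helper name missing_packages is hypothetical and not part of the original project; it only compares the requested names against what installed.packages() reports.

```r
# Hypothetical helper (not in the original project): return the names in
# `pkgs` that are not installed in any library on this machine.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

packs <- c('forcats', 'plotly', 'RColorBrewer', 'tmap', 'rnaturalearthdata',
           'sf', 'ggplot2', 'dplyr')

to_install <- missing_packages(packs)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left commented out so the check itself has no side effects.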

Loading Essential Packages

After the packages are installed, run the code below to load them into your R session.

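chooseCRANmirror(graphics=FALSE, ind=1) # No need to run this part, 
                                        # it's an R Markdown fix.

# Load every package listed in packs; each element of the result is TRUE
# when the package loaded and FALSE when it is not available.
lapply(packs, require, character.only = TRUE)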

Step 2 - The Data

For this project, we gathered information from PAHO, the Pan American Health Organization (http://www.paho.org/data/index.php/en/?option=com_content&view=article&id=532&Itemid=), to build our own data sets. PAHO is the specialized international health agency for the Americas. It works with countries throughout the region to improve and protect people’s health. PAHO engages in technical cooperation with its member countries to fight communicable and noncommunicable diseases and their causes, to strengthen health systems, and to respond to emergencies and disasters. It provides data in different formats, acquired from the countries’ Health Ministries through the Health Information Platform for the Americas (PLISA). After extensive cleaning and transformation, we structured the data and stored it in CSV files for further analysis.

Loading the Data

df1 <- read.csv("W_Casos_semanales_crosstab.csv", header = TRUE, sep = ',', stringsAsFactors = FALSE)
df2 <- read.csv("W_Casos_semanales_crosstab2017.csv", header = TRUE, sep = ',', stringsAsFactors = FALSE)
df3 <- read.csv("total_incidence.csv", header = TRUE, sep = ',', stringsAsFactors = FALSE)
df4 <- read.csv("total_cases.csv", header = TRUE, sep = ',', stringsAsFactors = FALSE)
df5 <- read.csv("linegraph_2018vs2017.csv", header = TRUE, sep = ',', stringsAsFactors = FALSE)
df6 <- read.csv("bargraph_2018vs2017.csv", header = TRUE, sep = ',', stringsAsFactors = FALSE)

Organizing Data for Mapping

First, we’re going to convert a Spatial Polygons data frame to an sf (simple features) object using st_as_sf. The data frame in question is states50, from the rnaturalearthdata package, which provides state (admin level 1) polygons for Australia, Brazil, Canada and the USA at 1:50m (medium) resolution.

states <- st_as_sf(states50)

Next, we’re going to use dplyr functions to organize and prepare a new set for plotting.

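# Keep only the Brazilian states and sort them alphabetically by name.
brazil <- states %>% 
  filter(admin == 'Brazil') %>% 
  select(state = name, geometry) %>% 
  arrange(state)

# Overwrite the state names with the names used in df4.
# Note: this is a positional assignment, so it assumes df4 has one row
# per state, in the same order as brazil.
brazil$state <- df4$state
colnames(df3)[colnames(df3) == 'State'] <- 'state'

# Attach incidence (df3) and case totals (df4) to each state polygon.
brazil <- brazil %>% 
  left_join(df3, by = 'state') %>% 
  left_join(df4, by = 'state') %>% 
  select(state, Total.Incidence = Total.Cumulative.Cases.Incidence, Total.Cases = total.cases)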
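The joins above depend on the state names matching exactly in both tables; a name that differs by even one character ends up with NA values on the map. The sketch below shows one way to check this before joining. It uses base R and two toy data frames (map_states and data_states) whose names and values are made up for illustration; they are not the project's data.

```r
# Toy stand-ins for the map table and a data table (made-up values).
map_states <- data.frame(
  state = c("Acre", "Bahia", "Rio de Janeiro"),
  stringsAsFactors = FALSE
)
data_states <- data.frame(
  state = c("Acre", "Bahia", "Rio De Janeiro"),  # note the capital "De"
  incidence = c(1.2, 3.4, 5.6),
  stringsAsFactors = FALSE
)

# States present in the map but absent from the data table.
unmatched <- setdiff(map_states$state, data_states$state)
print(unmatched)

# A left join (all.x = TRUE) keeps every map row; unmatched rows get NA.
merged <- merge(map_states, data_states, by = "state", all.x = TRUE)
print(merged)
```

An empty result from setdiff() means every polygon will receive a value after the join.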

Step 3 - Creating an Interactive Map

Tmapping Zika Virus Incidence per State 2018

Incidence, or the incidence coefficient, measures the rate at which new cases of a particular disease appear. It is calculated as the number of likely new cases divided by the population of a given geographical area, and is expressed per 100 thousand inhabitants. Source: https://www.diferenca.com/incidencia-e-prevalencia/

tmap_mode("view")
tm_shape(brazil) + 
  tm_polygons("Total.Incidence", title = "Total Incidence") + 
  tm_scale_bar()
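The incidence definition used in this section can be written as a short function. This is only a sketch of the formula: the function name incidence_per_100k and the numbers in the example are hypothetical and do not come from the PAHO data.

```r
# Hypothetical helper: incidence expressed per 100 thousand inhabitants.
incidence_per_100k <- function(new_cases, population) {
  stopifnot(all(population > 0))  # a population of zero has no defined rate
  new_cases / population * 1e5
}

# Made-up example: 250 new cases in a population of one million.
example_incidence <- incidence_per_100k(new_cases = 250, population = 1e6)
print(example_incidence)  # 25
```

Scaling by population is what makes states of very different sizes comparable on the same map.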

Tmapping Zika Virus Cases per State 2018

Prevalence, reported here as the number of cases, measures the number of occurrences of a disease in a population over a specific period of time.

tm_shape(brazil) + 
  tm_polygons("Total.Cases", title = "Total Cases", breaks = c(0,15000,50000,75000,150000)) + 
  tm_scale_bar()
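To get a feel for how fixed breaks split the case totals into classes, the same break points can be applied with base R's cut(). This is an illustrative sketch only: the example values are made up, and closing each interval on the left (right = FALSE) is an assumption, since the original does not state how the map treats values that fall exactly on a break.

```r
# Break points copied from the tm_polygons() call above.
breaks <- c(0, 15000, 50000, 75000, 150000)

# Made-up case totals, for illustration only.
example_cases <- c(100, 20000, 60000, 100000, 200000)

# labels = FALSE returns the class index (1 to 4) for each value.
# right = FALSE makes intervals closed on the left, e.g. [0, 15000).
class_index <- cut(example_cases, breaks = breaks, labels = FALSE,
                   right = FALSE, include.lowest = TRUE)
print(class_index)
```

In this base R sketch, a value above the last break (200000 here) gets no class and comes back as NA, so it is worth comparing the largest total against the last break before fixing breaks by hand.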

Step 4 - Creating Interactive Charts

Comparing 2018 with 2017 using geom_line

# Line plot
df5$label <- as.factor(df5$label)

plot <- df5 %>%
  ggplot(aes(x = weeks, y = totalperweek)) +
  geom_line(aes(colour = label), size = 1.3) +
  geom_point(aes(colour = label), size = 0.8) +
  theme(plot.title = element_text(face = "bold"), 
        plot.caption = element_text(face = "bold"),
        panel.background = element_blank(),
        axis.text.x = element_text(size = 7),
        axis.title = element_text(face = "bold",size = 12),
        axis.line.x = element_line(colour = "black", 
                                   size=1, 
                                   lineend = "butt"),
        axis.line.y = element_line(colour = "black", 
                                   size=1, 
                                   lineend = "butt"),
        legend.title = element_text(size=10, color = "black", face="bold"),
        legend.position= c(0.85, 0.5),
        legend.background = element_blank(),
        legend.key = element_blank()) +
  labs(title="Weekly Cases", subtitle="2018 vs 2017",
       caption = 'Source: PAHO - Pan American Health Organization', 
       y="Total Cases/Week", x="Epidemiological Weeks", color = 'Years') +
  scale_x_continuous(breaks = c(1:52), expand=c(0.009, 0))

ggplotly(plot) 

Comparing 2018 with 2017 using geom_bar

# Bar plot

df6$year <- as.factor(df6$year)

plot2 <- df6 %>% 
  ggplot(aes(x = state, y = cumulative.total)) +
  geom_bar(aes(fill = year), stat = 'identity') +
  theme(plot.title = element_text(face = "bold"), 
        plot.caption = element_text(face = "bold"),
        panel.background = element_blank(),
        axis.title.y = element_text(face = "bold", size = 12, vjust = 3),
        axis.title.x = element_blank(),
        axis.line.x = element_line(colour = "black", 
                                   size=1, 
                                   lineend = "butt"),
        axis.line.y = element_line(colour = "black", 
                                   size=1, 
                                   lineend = "butt"),
        axis.text.x=element_text(size=10, angle = 90,
                                 hjust=1),
        legend.title = element_text(size=10, color = "black", face="bold"),
        legend.position= c(0.85, 0.7),
        legend.background = element_blank(),
        legend.key = element_blank()) +
  labs(title="Yearly Prevalence", subtitle="2018 vs 2017",
       caption = 'Source: PAHO - Pan American Health Organization', 
       y="Total Cases/State", fill = 'Years') + 
  scale_fill_brewer(palette = "Set1")

ggplotly(plot2)

Plotting Cumulative Incidence over 2018

df3$state <- as.factor(df3$state)

plot3 <- df3 %>% 
  mutate(state = fct_reorder(state, Weekly.Cases.Incidence, .desc = TRUE)) %>% 
  ggplot(aes(x = state, y = Weekly.Cases.Incidence)) +
  geom_bar(stat = 'identity',width = 0.8, fill = "#E69F00") +
  geom_text(aes(label = Weekly.Cases.Incidence), position = position_dodge(width = 1), 
            size = 3.5, vjust = -1)+
  theme(plot.title = element_text(face = "bold", vjust = 2), 
        plot.caption = element_text(face = "bold"),
        plot.subtitle = element_text(vjust = 3),
        panel.background = element_blank(),
        axis.title.y = element_text(face = "bold", size = 12, vjust = 3),
        axis.title.x = element_blank(),
        
        axis.line.x = element_line(colour = "black", 
                                   size=1, 
                                   lineend = "butt"),
        axis.line.y = element_line(colour = "black", 
                                   size=1, 
                                   lineend = "butt"),
        axis.text.x = element_text(size=10, angle = 90,
                                 hjust=1),
        legend.title = element_text(size=10, color = "black", face="bold"),
        legend.position = c(0.85, 0.7),
        legend.background = element_blank(),
        legend.key = element_blank()) +
  labs(title="Cumulative Incidence", subtitle="per State",
       caption = 'Source: PAHO - Pan American Health Organization', 
       y="Cumulative Incidence")
  

ggplotly(plot3, tooltip = 'all') %>% 
  style(hoverinfo ='none')